Experimental evaluation of Italian language models for large-dictionary speech recognition

Authors

  • M. Codogno
  • Luciano Fissore
  • Alessandro Martelli
  • G. Pirani
  • Giampiero Volpi
Abstract

This paper reports on experiments performed on the Italian language to assess the efficiency of probabilistic language models for a large-dictionary speech recognition task. Two different types of models, an M-gram and an Mg-gram one, were investigated for comparison. The quality of the models, trained on a corpus of 3.5 million words, was measured both in terms of perplexity and in terms of the improvement achieved by integrating the language model into real speech recognition systems. Judging from the latter, empirical measurement, the two language models exhibit equivalent performance for Italian, although the perplexity measurements would suggest otherwise.
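The abstract refers to perplexity as one of the two quality measures. As a rough illustration only, the following Python sketch computes the perplexity of a bigram model (the M = 2 case of an M-gram model) on tokenized sentences. The add-one smoothing, the sentence boundary markers `<s>` and `</s>`, the function names, and the toy Italian sentences are all assumptions made for this example; the abstract does not state how the paper's models were estimated or smoothed.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        # Boundary markers are an assumption of this sketch.
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(prev, word, context_counts, bigram_counts, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

def perplexity(sentences, context_counts, bigram_counts, vocab):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_prob = 0.0
    n_predictions = 0
    vocab_size = len(vocab)
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            p = bigram_prob(prev, word, context_counts, bigram_counts, vocab_size)
            log_prob += math.log2(p)
            n_predictions += 1
    return 2 ** (-log_prob / n_predictions)

# Toy data, invented for illustration (not from the paper's corpus).
train = [["il", "gatto", "dorme"], ["il", "cane", "dorme"]]
ctx, big, vocab = train_bigram(train)

pp_seen = perplexity([["il", "gatto", "dorme"]], ctx, big, vocab)
pp_unseen = perplexity([["dorme", "il"]], ctx, big, vocab)
print(pp_seen, pp_unseen)
```

A lower perplexity means the model finds the test text less surprising. As the abstract notes, a perplexity gap between two models does not necessarily translate into a recognition accuracy gap, which is why the authors also measured the models inside actual recognizers.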


Similar resources

A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...


Adaptation of Pronunciation Dictionaries for Recognition of Unseen Languages

This paper studies the relative effectiveness of different methods for multilingual model combination and dictionary mapping for recognizing a new unseen target language if training data are limited. We examine the crosslanguage transfer from monolingual and multilingual models to German and Russian language for large vocabulary speech recognition using a dictation database which has been colle...


A Large Vocabulary Continuous Speech Recognition System for Indonesian Language

This paper presents our work to build a pioneering Indonesian Large Vocabulary Continuous Speech Recognition (LVCSR) System. In order to build an LVCSR system, high accurate acoustic models and large-scale language models are essential. Since Indonesian speech corpus was not available yet, we tried to collect speech data from Indonesian native speakers to construct a speech corpus for training ...


Finite-state Transducer Base with Explicit Modeling of Ph

This article describes the design and the experimental evaluation of the first Hungarian large vocabulary continuous speech recognition (LVCSR) system. The architecture of the recognition system is based on the recently proposed weighted finite state transducer (WFST) paradigm. The task domain is the recognition of fluently read sentences selected from a major daily newspaper. Recognition perfo...


Automatic Clinical Speech Recognition for CLEF 2015 eHealth Challenge

In this working notes report/paper, we describe the details of two submissions for CLEF 2015 eHealth challenge for Task 1a, with details of methods and tools developed for automatic speech recognition of NICTA synthetic nursing handover dataset. The first method involves a novel zero-resource approach based on unsupervised acoustic only modeling of speech involving word discovery, and the secon...




Publication date: 1987